Romanian TimeBank: An Annotated Parallel Corpus for Temporal Information

نویسندگان

  • Corina Forascu
  • Dan Tufis
چکیده

The paper describes the main steps for the construction, annotation and validation of the Romanian version of the TimeBank corpus. Starting from the English TimeBank corpus – the reference annotated corpus in the temporal domain, we have translated all the 183 English news texts into Romanian and mapped the English annotations onto Romanian, with a success rate of 96.53%. Based on ISO-Time the emerging standard for representing temporal information, which includes many of the previous annotations schemes -, we have evaluated the automatic transfer onto Romanian and, and, when necessary, corrected the Romanian annotations so that in the end we obtained a 99.18% transfer rate for the TimeML annotations. In very few cases, due to language peculiarities, some original annotations could not be transferred. For the portability of the temporal annotation standard to Romanian, we suggested some additions for the ISO-Time standard, concerning especially the EVENT tag, based on linguistic evidence, the Romanian grammar, and also on the localisations of TimeML to other Romance languages. Future improvements to the Ro-TimeBank will take into consideration all temporal expressions, signals and events in texts, even those with a not very clear temporal anchoring.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-automatic Annotation of the Romanian TimeBank 1.2

Temporal information is essential in many Natural Language Processing applications. The paper presents our activities and research towards obtaining a parallel version, EnglishRomanian, of the TimeBank 1.2. corpus, annotated according to the TimeML 1.2.1. standard. The automatic import from English to Romanian can be done automatically for 96.53% of the temporal markups; it will determine the a...

متن کامل

TRIOS-TimeBank Corpus: Extended TimeBank Corpus with Help of Deep Understanding of Text

TimeBank (Pustejovsky et al, 2003a), a reference for TimeML (Pustejovsky et al, 2003b) compliant annotation, is widely used temporally annotated corpus in the community. It captures time expressions, events, and relations between events and event and temporal expression; but there is room for improvements in this hand-annotated widely used TimeBank corpus. This work is one such effort to extend...

متن کامل

GMT to +2 or how can TimeML be used in Romanian

The paper describes the construction and usage of the Romanian version of the TimeBank corpus. The success rate of 96.53% for the automatic import of the temporal annotation from English to Romanian shows that the automatic transfer is a worth doing enterprise if temporality is to be studied in another language than the one for which TimeML, the annotation standard used, was developed. A prelim...

متن کامل

Machine Learning Approaches for Temporal Information Extraction: A comparative study

Temporal expressions are important structures in natural language. In order to understand text, temporal expressions have to be extracted and normalized to ISO-based values. For these purposes rule-based and machine learning techniques were proposed. In this paper we present and compare two approaches for automatic recognition of temporal expressions in free text, based on a supervised machine ...

متن کامل

Annotating Events, Temporal Expressions and Relations in Italian: the It-Timeml Experience for the Ita-TimeBank

This paper presents the annotation guidelines and specifications which have been developed for the creation of the Italian TimeBank, a language resource composed of two corpora manually annotated with temporal and event information. In particular, the adaptation of the TimeML scheme to Italian is described, and a special attention is given to the methodology used for the realization of the anno...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012